Incorporating Advice into Agents that Learn from Reinforcements
Authors
Richard Maclin, Jude W. Shavlik
Abstract
Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present an approach that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple programming language. Based on techniques from knowledge-based neural networks, these programs are inserted directly into the agent’s utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that shows our approach leads to statistically-significant gains in expected reward. Importantly, the advice improves the expected reward regardless of the stage of training at which it is given.
Introduction
A successful and increasingly popular method for creating intelligent agents is to have them learn from reinforcements (Barto, Sutton, & Watkins 1990; Lin 1992; Mahadevan & Connell 1992). However, these approaches suffer from their need for large numbers of training episodes. While several approaches for speeding up reinforcement learning have been proposed, a largely unexplored approach is to design a learner that can also accept advice from an external observer. We present and evaluate an approach for creating advice-taking learners.
To illustrate the general idea of advice-taking, imagine that you are watching an agent learning to play some video game. Assume you notice that frequently the agent loses because it goes into a “box canyon” in search of food and then gets trapped by its opponents.
One would like to give the learner advice such as “do not go into box canyons when opponents are in sight.” Importantly, the external observer should be able to provide its advice in some quasi-natural language, using terms about the specific task domain. In addition, the advice-giver should be oblivious to the details of whichever internal representation and learning algorithm the agent is using.
Recognition of the value of advice-taking has a long history in AI. The general idea of an agent accepting advice was first proposed about 35 years ago by McCarthy (1958). Over a decade ago, Mostow (1982) developed a program that accepted and “operationalized” high-level advice about how to better play the card game Hearts. More recently Gordon and Subramanian (1994) created a system that deductively compiles high-level advice into concrete actions, which are then refined using genetic algorithms. However, the problem of making use of general advice has been largely neglected.
In the next section, we present a framework for using advice with reinforcement learners. The subsequent section presents experiments that investigate the value of our approach. Finally, we list possible extensions to our work, further describe its relation to other research, and present some conclusions.
*This research was partially supported by ONR Grant N00014-93-1-0998 and NSF Grant IRI-9002413.
The General Framework
In this section we describe our approach for creating a reinforcement learner that can accept advice. We use connectionist Q-learning (Sutton 1988; Watkins 1989) as our form of reinforcement learning (RL). Figure 1 shows the general structure of a reinforcement learner, augmented (in bold) with our advice-taking extensions. In RL, the learner senses the current world state, chooses an action to execute, and occasionally receives rewards and punishments.
Based on these reinforcements from the environment, the task of the learner is to improve its action-choosing module such that it increases the amount of rewards it receives. In our augmentation, an observer watches the learner ...
[Figure 1: RL with an external advisor. Recoverable labels: Observer, advice, behavior.]
From: AAAI-94 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.
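The excerpt above describes advice being inserted directly into a connectionist Q-function and then refined by further reinforcement learning. The following is a minimal Python sketch of that idea, not the authors' implementation. The class `AdvisableQNet`, the methods `add_advice` and `td_update`, the binary feature encoding, and the values of `omega` and `strength` are all hypothetical choices made for illustration. The advice rule is compiled, in the spirit of knowledge-based neural networks, into one extra hidden unit that is strongly active only when every antecedent feature is 1. To stay short, the learning step adjusts only the output weights; a full connectionist Q-learner would presumably propagate the error through all weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class AdvisableQNet:
    """Tiny Q-function: sigmoid hidden units, one linear output per action."""

    def __init__(self, n_inputs, n_hidden, actions, seed=0):
        rng = random.Random(seed)
        self.n_inputs = n_inputs
        self.actions = list(actions)
        # Each hidden unit is a pair (input weights, bias).
        self.hidden = [
            ([rng.uniform(-0.1, 0.1) for _ in range(n_inputs)], 0.0)
            for _ in range(n_hidden)
        ]
        # For each action, one output weight per hidden unit.
        self.out = {
            a: [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
            for a in self.actions
        }

    def hidden_acts(self, state):
        return [
            sigmoid(sum(w * x for w, x in zip(ws, state)) + b)
            for ws, b in self.hidden
        ]

    def q(self, state, action):
        h = self.hidden_acts(state)
        return sum(w * act for w, act in zip(self.out[action], h))

    def add_advice(self, antecedents, action, strength, omega=4.0):
        """Insert the rule 'IF all antecedent features are 1 THEN shift Q(action)'.

        Assumption: a new hidden unit gets weight omega from each antecedent
        and a bias that keeps it mostly off unless all antecedents are on.
        """
        ws = [omega if i in antecedents else 0.0 for i in range(self.n_inputs)]
        bias = -omega * (len(antecedents) - 0.5)
        self.hidden.append((ws, bias))
        for a in self.actions:
            self.out[a].append(strength if a == action else 0.0)

    def td_update(self, s, a, r, s_next, alpha=0.1, gamma=0.9):
        """One Q-learning step on the output weights only (a simplification)."""
        target = r + gamma * max(self.q(s_next, b) for b in self.actions)
        err = target - self.q(s, a)
        h = self.hidden_acts(s)
        self.out[a] = [w + alpha * err * hi for w, hi in zip(self.out[a], h)]
        return err  # temporal-difference error measured before the update


# Hypothetical binary features: [opponent_in_sight, in_box_canyon, food_visible]
net = AdvisableQNet(n_inputs=3, n_hidden=2, actions=["move_forward", "turn_back"])
# "Do not go into box canyons when opponents are in sight."
net.add_advice(antecedents={0, 1}, action="move_forward", strength=-2.0)
print(net.q([1, 1, 0], "move_forward"), net.q([0, 0, 1], "move_forward"))
```

Because the advice lives in ordinary weights, later calls to `td_update` can strengthen, weaken, or override it, which is one way to read the statement that subsequent reinforcement learning integrates and refines the advice.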
Similar resources
Twelfth National Conference on Artificial Intelligence (AAAI-94). Incorporating Advice into Agents that Learn from Reinforcements
Incorporating Advice into Agents that Learn from Reinforcements. Richard Maclin, Jude W. Shavlik. Computer Sciences Dept., University of Wisconsin, 1210 West Dayton Street, Madison, WI 53706. Email: fmaclin,[email protected] Abstract: Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training epis...
Building Intelligent Agents for Web-Based Tasks: A Theory-Re
We present and evaluate an infrastructure with which to rapidly and easily build intelligent software agents for Web-based tasks. Our design is centered around two basic functions: ScoreThisLink and ScoreThisPage. If given highly accurate such functions, standard heuristic search would lead to efficient retrieval of useful information. Our approach allows users to tailor our system's behavior b...
Intelligent Agents for Web-based Tasks: An Advice-Taking
We present and evaluate an implemented system with which to rapidly and easily build intelligent software agents for Web-based tasks. Our design is centered around two basic functions: ScoreThisLink and ScoreThisPage. If given highly accurate such functions, standard heuristic search would lead to efficient retrieval of useful information. Our approach allows users to tailor our system's behavior...
Intelligent Agents for Web-based Tasks: An Advice-Taking Approach
We present and evaluate an implemented system with which to rapidly and easily build intelligent software agents for Web-based tasks. Our design is centered around two basic functions: ScoreThisLink and ScoreThisPage. If given highly accurate such functions, standard heuristic search would lead to efficient retrieval of useful information. Our approach allows users to tailor our system’s behavi...
Simultaneously Learning and Advising in Multiagent Reinforcement Learning
Reinforcement Learning has long been employed to solve sequential decision-making problems with minimal input data. However, the classical approach requires a large number of interactions with an environment to learn a suitable policy. This problem is further intensified when multiple autonomous agents are simultaneously learning in the same environment. The teacher-student approach aims at all...